Search results for "Constrained clustering"

Showing 9 of 9 documents

SMART: Unique splitting-while-merging framework for gene clustering

2014

© 2014 Fa et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Successful clustering algorithms are highly dependent on parameter settings. The clustering performance degrades significantly unless parameters are properly set, and yet, it is difficult to set these parameters a priori. To address this issue, in this paper, we propose a unique splitting-while-merging clustering framework, named "splitting merging awareness tactics" (SMART), which does not require any a priori knowledge of either the number …

Keywords: clustering algorithms; constrained clustering; SMART framework; splitting-merging awareness tactics (SMART); clustering framework; cluster analysis; determining the number of clusters in a data set; finite mixture model; competitive learning model; fuzzy clustering; correlation clustering; affinity propagation; canopy clustering algorithm; CURE data clustering algorithm; data mining; signal processing; genomic signal processing; bioinformatics; computational biology; microarrays; oligonucleotide array sequence analysis; gene expression; gene expression profiling; gene regulation; cell signaling; signal transduction; molecular genetics

Structural clustering of millions of molecular graphs

2014

We propose an algorithm for clustering very large molecular graph databases according to scaffolds (i.e., large structural overlaps) that are common between cluster members. Our approach first partitions the original dataset into several smaller datasets using a greedy clustering approach named APreClus, based on dynamic seed clustering. APreClus is an online, instance-incremental clustering algorithm that delays the final cluster assignment of an instance until one of the so-called pending clusters it belongs to has reached a significant size and is converted into a fixed cluster. Once a cluster is fixed, APreClus recalculates the cluster centers, which are used as representatives for…
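The pending-to-fixed promotion that the abstract describes can be mirrored in a toy sketch. Everything here (1-D points, first-match assignment, the `radius`/`min_size` parameters) is our own simplification to show the control flow, not the actual APreClus procedure:

```python
def apreclus_sketch(stream, radius, min_size):
    """Toy dynamic seed clustering: each instance joins the nearest pending
    cluster within `radius` (else seeds a new one); a pending cluster is
    promoted to fixed once it reaches `min_size`. Names and the 1-D distance
    are illustrative, not from the paper."""
    pending, fixed = [], []            # each cluster: [center, members]
    for x in stream:
        target = None
        for c in pending:              # first pending cluster within radius
            if abs(x - c[0]) <= radius:
                target = c
                break
        if target is None:             # no match: seed a new pending cluster
            target = [x, []]
            pending.append(target)
        target[1].append(x)
        if len(target[1]) >= min_size:  # promote: assignment becomes final
            pending.remove(target)
            target[0] = sum(target[1]) / len(target[1])  # recalc center
            fixed.append(target)
    return fixed, pending

fixed, pending = apreclus_sketch([0.0, 0.2, 5.0, 0.1, 5.1, 5.2],
                                 radius=1.0, min_size=3)
```

On this tiny stream the two interleaved groups each reach `min_size` and end up as fixed clusters with recalculated centers.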

Keywords: clustering high-dimensional data; fuzzy clustering; k-medoids; k-medians clustering; single-linkage clustering; complete-linkage clustering; correlation clustering; constrained clustering; hierarchical clustering; data stream clustering; CURE data clustering algorithm; canopy clustering algorithm; FLAME clustering; affinity propagation; clustering coefficient; cluster analysis; data mining; theoretical computer science
Published in: Proceedings of the 29th Annual ACM Symposium on Applied Computing

Clustering categorical data: A stability analysis framework

2011

Clustering to identify inherent structure is an important first step in data exploration. The k-means algorithm is a popular choice, but it is not generally appropriate for categorical data. A specific extension of k-means for categorical data is the k-modes algorithm. Both of these partition clustering methods are sensitive to the initialization of prototypes, which makes it difficult to select the best solution for a given problem. In addition, selecting the number of clusters can be an issue. Further, the k-modes method is especially prone to instability when presented with ‘noisy’ data, since the calculation of the mode lacks the smoothing effect inherent in the calculation …
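A minimal sketch of the k-modes idea referred to above: prototypes are per-attribute modes and the distance is simple matching (mismatch count). The function and toy dataset are illustrative, not from the paper, and the naive first/last-row seeding deliberately exhibits the initialization sensitivity the abstract mentions:

```python
from collections import Counter

def k_modes(rows, n_iter=10):
    """Toy k-modes: like k-means, but prototypes are per-attribute modes and
    the distance is the number of mismatched categorical attributes."""
    modes = [rows[0], rows[-1]]  # naive seeding; k-modes is sensitive to this
    for _ in range(n_iter):
        clusters = [[] for _ in modes]
        for row in rows:  # assignment step: nearest mode by mismatch count
            d = [sum(a != b for a, b in zip(row, m)) for m in modes]
            clusters[d.index(min(d))].append(row)
        for i, members in enumerate(clusters):  # update step: per-column mode
            if members:
                modes[i] = tuple(Counter(col).most_common(1)[0][0]
                                 for col in zip(*members))
    return modes, clusters

data = [("red", "s"), ("red", "s"), ("blue", "l"), ("blue", "l"), ("blue", "m")]
modes, clusters = k_modes(data)
```

On this data the prototypes settle on ("red", "s") and ("blue", "l"); note how a single atypical row shifts a column's mode abruptly, which is the instability under noise that the abstract discusses.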

Keywords: single-linkage clustering; correlation clustering; constrained clustering; consensus clustering; data stream clustering; CURE data clustering algorithm; determining the number of clusters in a data set; cluster analysis; data mining; machine learning; artificial intelligence
Published in: 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)

Prototype-based learning on concept-drifting data streams

2014

Data stream mining has gained growing attention due to its wide range of emerging applications, such as target marketing, email filtering and network intrusion detection. In this paper, we propose a prototype-based classification model for evolving data streams, called SyncStream, which dynamically models time-changing concepts and makes predictions in a local fashion. Instead of learning a single model on a sliding window or using ensemble learning, SyncStream captures evolving concepts by dynamically maintaining a set of prototypes in a new data structure called the P-tree. The prototypes are obtained by error-driven representativeness learning and synchronization-inspired constrained clustering. To ide…
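The error-driven flavor of prototype maintenance can be caricatured with a nearest-prototype learner that stores an instance only when it is misclassified. This is a loose stand-in for SyncStream's representativeness learning and P-tree, not the paper's method; all names and the crude oldest-first eviction are ours:

```python
def error_driven_prototypes(stream, max_protos=10):
    """Toy error-driven prototype learning: predict each (x, y) with the
    nearest stored prototype; on a mistake, keep the instance as a new
    prototype, dropping the oldest when the budget is exceeded."""
    protos = []                        # list of (value, label) prototypes
    errors = 0
    for x, y in stream:
        if protos:
            px, py = min(protos, key=lambda p: abs(p[0] - x))
            if py != y:                # misclassified: instance is informative
                errors += 1
                protos.append((x, y))
        else:
            protos.append((x, y))      # bootstrap with the first instance
        if len(protos) > max_protos:
            protos.pop(0)              # crude forgetting of stale concepts
    return protos, errors

stream = [(0.0, "a"), (0.1, "a"), (5.0, "b"), (5.1, "b"), (0.2, "a")]
protos, errors = error_driven_prototypes(stream)
```

Only the first "b" instance is mispredicted, so exactly one extra prototype is stored; correctly predicted instances add nothing, which keeps the model compact.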

Keywords: data stream; concept drift; data stream mining; constrained clustering; data structure; machine learning; ensemble learning; synchronization (computer science); data mining; artificial intelligence
Published in: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Distance-constrained data clustering by combined k-means algorithms and opinion dynamics filters

2014

Data clustering algorithms represent mechanisms for partitioning huge arrays of multidimensional data into groups with small in-group and large out-group distances. Most of the existing algorithms fail when a lower bound for the distance among cluster centroids is specified, even though this type of constraint can help obtain a better clustering. Traditional approaches require that the desired number of clusters be specified a priori, which demands either a subjective decision or global meta-information that is not easily obtainable. In this paper, an extension of the standard data clustering problem is addressed, including additional constraints on the cluster centroid di…
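A rough sketch of the kind of constraint involved: run ordinary Lloyd iterations, then merge any centroids closer than the lower bound `d_min`, letting the number of clusters shrink. This is a simplified stand-in under our own assumptions (1-D data, pairwise merge by averaging); the paper combines k-means with opinion dynamics filters rather than this merge step:

```python
import random

def constrained_kmeans(points, k, d_min, n_iter=20, seed=1):
    """Toy distance-constrained clustering: Lloyd's iterations followed by
    merging any centroids closer than d_min, so the surviving centroid set
    respects the lower-bound constraint (k may shrink)."""
    rng = random.Random(seed)
    cents = rng.sample(points, k)          # seed centroids from the data
    for _ in range(n_iter):
        groups = {i: [] for i in range(len(cents))}
        for p in points:                   # assignment step
            i = min(range(len(cents)), key=lambda i: abs(p - cents[i]))
            groups[i].append(p)
        cents = [sum(g) / len(g) for g in groups.values() if g]
        merged = []                        # enforce centroid-distance bound
        for c in sorted(cents):
            if merged and abs(c - merged[-1]) < d_min:
                merged[-1] = (merged[-1] + c) / 2
            else:
                merged.append(c)
        cents = merged
    return cents

pts = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
cents = constrained_kmeans(pts, k=4, d_min=1.0)
```

Even though four clusters were requested, the constraint collapses the result to the two well-separated centroids, which is the behavior the abstract argues for: the number of clusters need not be fixed a priori.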

Keywords: data clustering; k-means; opinion dynamics; Hegselmann–Krause model; fuzzy clustering; correlation clustering; single-linkage clustering; constrained clustering; determining the number of clusters in a data set; CURE data clustering algorithm; k-medians clustering; cluster analysis; data mining; Settore ING-INF/04 - Automatica
Published in: 22nd Mediterranean Conference on Control and Automation

Scalable Clustering by Iterative Partitioning and Point Attractor Representation

2016

Clustering very large datasets while preserving cluster quality remains a challenging data-mining task to date. In this paper, we propose an effective, scalable clustering algorithm for large datasets that builds on the powerful concept of synchronization. The proposed algorithm, CIPA (Clustering by Iterative Partitioning and Point Attractor Representations), is capable of handling very large datasets by iteratively partitioning them into thousands of subsets and clustering each subset separately. Using dynamic clustering by synchronization, each subset is then represented by a set of point attractors and outliers. Finally, CIPA identifies the…
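The partition-then-summarize pipeline can be sketched with a toy 1-D version: each partition is reduced to local group means (playing the role of "point attractors"), and a final pass clusters the attractors. The gap-based grouping below is our stand-in for the paper's synchronization dynamics, and all names and parameters are illustrative:

```python
def group_1d(values, eps):
    """Group sorted 1-D values into runs whose consecutive gaps are <= eps;
    a stand-in for the paper's synchronization-based clustering."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= eps:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups

def cipa_sketch(data, n_parts=4, eps=0.5):
    """Toy divide-and-conquer clustering in the spirit of CIPA: partition the
    data, summarize each partition by its local group means ("attractors"),
    then cluster the attractors to obtain the global result."""
    chunk = (len(data) + n_parts - 1) // n_parts
    attractors = []
    for i in range(0, len(data), chunk):
        for g in group_1d(data[i:i + chunk], eps):
            attractors.append(sum(g) / len(g))  # one attractor per local group
    return group_1d(attractors, eps)            # global pass over attractors

data = [0.0, 5.0, 0.1, 5.1, 0.2, 5.2, 0.3, 5.3]
clusters = cipa_sketch(data)
```

The point of the divide-and-conquer structure is that only one small subset (and the compact attractor set) needs to be in memory at a time, which is what makes this style of algorithm scale.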

Keywords: fuzzy clustering; single-linkage clustering; correlation clustering; constrained clustering; data stream clustering; CURE data clustering algorithm; canopy clustering algorithm; cluster analysis; data mining; information systems; artificial intelligence & image processing
Published in: ACM Transactions on Knowledge Discovery from Data

A Novel Clustering Algorithm based on a Non-parametric "Anti-Bayesian" Paradigm

2015

The problem of clustering, or unsupervised classification, has been solved by a myriad of techniques, all of which depend, either directly or implicitly, on the Bayesian principle of optimal classification. To be more specific, within a Bayesian paradigm, if one is to compare the testing sample with only a single point in the feature space from each class, the optimal Bayesian strategy would be to achieve this based on the distance from the corresponding means or central points in the respective distributions. When this principle is applied in clustering, one would assign an unassigned sample into the cluster whose mean is the closest, and this can be done in either a bottom-up or a top-dow…

Keywords: fuzzy clustering; correlation clustering; constrained clustering; pattern recognition; data stream clustering; CURE data clustering algorithm; canopy clustering algorithm; affinity propagation; cluster analysis; data mining; artificial intelligence

Distributed Data Clustering via Opinion Dynamics

2015

We provide a distributed method to partition a large set of data into clusters, characterized by small in-group and large out-group distances. We assume a wireless sensor network in which each sensor is given a large set of data, and the objective is to provide a way to group the sensors into homogeneous clusters by information type. In previous literature, the desired number of clusters must be specified a priori by the user. In our approach, the clusters are constrained to have centroids at a distance of at least ε from each other, and the number of desired clusters is not specified. Although traditional algorithms fail to solve the problem with this constraint, it can help obtain a better cluste…
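Opinion-dynamics clustering of this kind is typically built on Hegselmann–Krause bounded-confidence averaging (the model named in the related paper above). Here is a minimal centralized one-dimensional sketch; the paper's contribution, a distributed version for sensor networks, is more involved:

```python
def hegselmann_krause(x, eps, n_steps=50):
    """1-D Hegselmann–Krause dynamics: each value repeatedly moves to the
    average of all values within confidence radius eps. Values converge to a
    few cluster centroids that end up more than eps apart, so the number of
    clusters emerges from eps rather than being specified a priori."""
    x = list(x)
    for _ in range(n_steps):
        x = [sum(xj for xj in x if abs(xj - xi) <= eps) /
             sum(1 for xj in x if abs(xj - xi) <= eps) for xi in x]
    return x

data = [0.0, 0.1, 0.2, 3.0, 3.1]
final = hegselmann_krause(data, eps=0.5)
```

After convergence the five values collapse onto two consensus points, matching the abstract's framing: a minimum centroid distance is enforced implicitly and no cluster count is requested up front.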

Keywords: constrained clustering; cluster analysis; determining the number of clusters in a data set; consensus problems; networks; wireless; partition (database); computer networks and communications; theoretical computer science; general engineering; Settore ING-INF/04 - Automatica

SparseHC: A Memory-efficient Online Hierarchical Clustering Algorithm

2014

Computing a hierarchical clustering of objects from a pairwise distance matrix is an important algorithmic kernel in computational science. Since the storage of this matrix requires quadratic space with respect to the number of objects, the design of memory-efficient approaches is of high importance to this research area. In this paper, we address this problem by presenting a memory-efficient online hierarchical clustering algorithm called SparseHC. SparseHC scans a sorted and possibly sparse distance matrix chunk-by-chunk. Meanwhile, a dendrogram is built by merging cluster pairs as and when the distance between them is determined to be the smallest among all remaining cluster pairs. The k…
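The merge-as-soon-as-smallest idea is easiest to see for single linkage: scanning edges in ascending distance order and merging with a union-find structure yields the dendrogram directly, which is why only a sorted (possibly sparse) distance stream, not the quadratic matrix, needs to be held. A minimal sketch under that single-linkage assumption (SparseHC itself also handles complete and average linkage, with more bookkeeping):

```python
class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, i):
        while self.parent[i] != i:          # path halving
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra
            return True
        return False

def single_linkage_merges(n, sorted_edges):
    """Consume (distance, i, j) edges in ascending order and record a merge
    whenever two clusters first become connected; only the union-find state,
    not the full distance matrix, stays in memory."""
    ds = DisjointSet(n)
    dendrogram = []
    for d, i, j in sorted_edges:
        if ds.union(i, j):
            dendrogram.append((d, i, j))
        if len(dendrogram) == n - 1:        # tree complete; stop early
            break
    return dendrogram

edges = sorted([(0.1, 0, 1), (0.2, 2, 3), (0.9, 1, 2), (1.5, 0, 3)])
merges = single_linkage_merges(4, edges)
```

The last edge is never examined: once n − 1 merges have been recorded the dendrogram is complete, which is the same early-exit opportunity a chunk-by-chunk scan exploits.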

Keywords: hierarchical clustering; memory-efficient clustering; online algorithms; sparse matrix; distance matrix; dendrogram; single-linkage clustering; complete-linkage clustering; nearest-neighbor chain algorithm; consensus clustering; hierarchical clustering of networks; constrained clustering; k-medoids; k-medians clustering; canopy clustering algorithm; FLAME clustering; cluster analysis
Published in: Procedia Computer Science